Labeled Morphological Segmentation with Semi-Markov Models
نویسندگان
چکیده
We present labeled morphological segmentation—an alternative view of morphological processing that unifies several tasks. We introduce a new hierarchy of morphotactic tagsets and CHIPMUNK, a discriminative morphological segmentation system that, contrary to previous work, explicitly models morphotactics. We show improved performance on three tasks for all six languages: (i) morphological segmentation, (ii) stemming and (iii) morphological tag classification. For morphological segmentation our method shows absolute improvements of 2-6 points F1 over a strong baseline.
منابع مشابه
IMAGE SEGMENTATION USING GAUSSIAN MIXTURE MODEL
Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models have key role in probabilistic data analysis. In this paper, we have learned Gaussian mixture model to the pixels of an image. The parameters of the model have estimated by EM-algorithm. In addition pixel labeling corresponded to each pixel of true image is made by Bayes rule. In fact, ...
متن کاملImage Segmentation using Gaussian Mixture Model
Abstract: Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models have key role in probabilistic data analysis. In this paper, we used Gaussian mixture model to the pixels of an image. The parameters of the model were estimated by EM-algorithm. In addition pixel labeling corresponded to each pixel of true image was made by Bayes rule. In fact,...
متن کاملUnsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, m...
متن کاملSemi-supervised extensions to Morfessor Baseline
We have extended Morfessor Baseline, which is a well-known method for unsupervised morphological segmentation, to semi-supervised learning. As submission to Morpho Challenge 2010, we provide results from three methods: The first one is based on the unsupervised algorithm, but includes a weight parameter that can be used to control the amount of segmentation. The second one applies the semisuper...
متن کاملSemi-Supervised Learning for Natural Language
Statistical supervised learning techniques have been successful for many natural language processing tasks, but they require labeled datasets, which can be expensive to obtain. On the other hand, unlabeled data (raw text) is often available “for free” in large quantities. Unlabeled data has shown promise in improving the performance of a number of tasks, e.g. word sense disambiguation, informat...
متن کامل